The Global Health Expenditure Database (GHED) provides time series data from 2000-2022 which covers country-level spending in the world. Healthcare in the USA is notably more expensive than its counter-parts across the world. This dataset will allow us to examine how different countries allocate their funds towards different segments of healthcare. For example, we may be able to explore how much spending is being allocated towards injuries, for example.
To explore the given dataset, we will look at three subsets in attempts to uncover interesting potential leads. The motivation behind this stems from the large presence of missing values that are in the dataset. We will demonstrate this below. Furthermore, variable names are not intuitive. Creating more meaningful subsets of the larger dataset will allow us to explore potentially interesting questions.
Before delving into analysis, we will focus on the purpose of GHED data. The GHED data is designed to support policy development, comparative analysis, sustainable development goal monitoring, and for advocacy. For instance, the granular data that is available on a standardized world-wide scale, allows UN policy makers to allocate funds more appropriately. Furthermore, officials are able to identify how, for example, aid funds are being spent within a country. This is critical for ensuring that, especially in the future, funds are allocated in the most effective way possible.
We build upon these goals to monitor one specific indicator (OOP costs) in everyday life. However, data is comprised of reports from Ministries of Health, national statistical offices, and other government bodies. Thus, internal bias within the country may influence the data. Regardless, we need to look at how funds are being spent by the average person across different countries in order to improve overall quality of life. While we don’t have costs broken down by household, we believe that OOP costs on the scale of a country serves as the best available proxy in our data for everyday expenses.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
GHED_data <- read_excel("GHED_data.XLSX", sheet = 1)
# View the structure of the dataset
dim(GHED_data)
## [1] 4244 3923
head(GHED_data)
## # A tibble: 6 × 3,923
## country code region income year che_gdp che_pc_usd hk_gdp hk_g_gdp
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Algeria DZA AFR Lower-middle 2000 3.49 62.1 NA NA
## 2 Algeria DZA AFR Lower-middle 2001 3.84 67.3 NA NA
## 3 Algeria DZA AFR Lower-middle 2002 3.73 66.9 NA NA
## 4 Algeria DZA AFR Lower-middle 2003 3.60 76.2 NA NA
## 5 Algeria DZA AFR Lower-middle 2004 3.54 93.0 NA NA
## 6 Algeria DZA AFR Lower-middle 2005 3.24 101. NA NA
## # ℹ 3,914 more variables: hk_ext_gdp <dbl>, che <dbl>, gghed <dbl>, pvtd <dbl>,
## # ext <dbl>, dom_che <dbl>, gghed_che <dbl>, pvtd_che <dbl>, oops_che <dbl>,
## # vpp_che <dbl>, ext_che <dbl>, gghed_gdp <dbl>, gghed_gge <dbl>,
## # gghed_pc_usd <dbl>, pvtd_pc_usd <dbl>, oop_pc_usd <dbl>, ext_pc_usd <dbl>,
## # tran_shi <dbl>, shise_shi <dbl>, cfa_che <dbl>, gfa_che <dbl>,
## # chi_che <dbl>, shi_che <dbl>, chi_pvt_che <dbl>, vfa_che <dbl>,
## # vhi_che <dbl>, row_che <dbl>, phc_usd_pc <dbl>, phc_che <dbl>, …
We can notice that the size of the datasset is 4,224 x 3,923, representing a large number of both observations and variables. The dataset does not provide much value as it is. Additionally, we can begin to see the missing value problem that was previously mentioned, particularly in columns 8-10.
GHED_data_filtered <- GHED_data %>%
filter(year == 2021)
len <- length(GHED_data)
missing_values <- as.data.frame(colSums(is.na(GHED_data_filtered)))
colnames(missing_values) <- "n"
missing_values$n <- missing_values$n / len
ggplot(missing_values, aes(x = n)) +
geom_histogram() +
labs(title = "Proportion of Missing Values in 2021", x = "proportion")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In just one year, the proportion of rows that have missing values vary greatly, ranging from 0% - 5%. From here on, we will begin to explore subsets of the data. The following three ideas will be investigated.
To begin this section, we will subset a section of data to use, as well as renaming certain variables to make it more readable and understandable to the viewers. This section is to look into how each countries have spent on specific disease as a percentage of their total health expenditure.
In this section we will take a look at: - Infectious and Parasitic Diseases - HIV/Aids and Sexually transmitted diseases - Tuberculosis - Malaria - Neglected Tropical Diseases - Reproductive Health - Maternal Conditions - Perinatal Conditions - Contraceptive Management - Nutritional Deficiencies - Injuries - Other unspecified diseases/conditions
For this subset we will choose the year of 2019 as the baseline as it represents the most recent snapshot of pre-COVID medical expenses
disease_dataset <- GHED_data %>%
rename(
infectious = dis1_che,
hiv_aids = dis11_che,
tuberculosis = dis12_che,
malaria = dis13_che,
neg_tropical = dis16_che,
reproductive = dis2_che,
maternal = dis21_che,
perinatal = dis22_che,
contraceptive = dis23_che,
nutrition = dis3_che,
noncomm = dis4_che,
injuries = dis5_che,
other = disnec_che
) %>%
select(
country,
region,
year,
hiv_aids,
tuberculosis,
malaria,
neg_tropical,
reproductive,
maternal,
perinatal,
contraceptive,
nutrition,
injuries,
other
)
head(disease_dataset)
## # A tibble: 6 × 14
## country region year hiv_aids tuberculosis malaria neg_tropical reproductive
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Algeria AFR 2000 NA NA NA NA NA
## 2 Algeria AFR 2001 NA NA NA NA NA
## 3 Algeria AFR 2002 NA NA NA NA NA
## 4 Algeria AFR 2003 NA NA NA NA NA
## 5 Algeria AFR 2004 NA NA NA NA NA
## 6 Algeria AFR 2005 NA NA NA NA NA
## # ℹ 6 more variables: maternal <dbl>, perinatal <dbl>, contraceptive <dbl>,
## # nutrition <dbl>, injuries <dbl>, other <dbl>
Looking through the current subset, it seems that there are areas that are missing information or may contain rows that only have limited information. In order to be able to provide a detailed analysis for this section we will need to clean the dataset. We will first take a look at the overall structure of the dataset.
str(disease_dataset)
## tibble [4,244 × 14] (S3: tbl_df/tbl/data.frame)
## $ country : chr [1:4244] "Algeria" "Algeria" "Algeria" "Algeria" ...
## $ region : chr [1:4244] "AFR" "AFR" "AFR" "AFR" ...
## $ year : num [1:4244] 2000 2001 2002 2003 2004 ...
## $ hiv_aids : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ tuberculosis : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ malaria : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ neg_tropical : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ reproductive : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ maternal : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ perinatal : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ contraceptive: num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ nutrition : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ injuries : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ other : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
Looking through this, we will need to filter out years that we do not need for this section, so to filter through to only 2019 data.
disease_filtered <- subset(disease_dataset, year %in% c(2019))
str(disease_filtered)
## tibble [192 × 14] (S3: tbl_df/tbl/data.frame)
## $ country : chr [1:192] "Algeria" "Angola" "Benin" "Botswana" ...
## $ region : chr [1:192] "AFR" "AFR" "AFR" "AFR" ...
## $ year : num [1:192] 2019 2019 2019 2019 2019 ...
## $ hiv_aids : num [1:192] NA NA NA 20.6 3.77 ...
## $ tuberculosis : num [1:192] NA NA NA 3.386 0.235 ...
## $ malaria : num [1:192] NA NA NA 5.79 15.22 ...
## $ neg_tropical : num [1:192] NA NA NA 1.24 4.88 ...
## $ reproductive : num [1:192] NA NA NA 10 14.5 ...
## $ maternal : num [1:192] NA NA NA 2.96 9.24 ...
## $ perinatal : num [1:192] NA NA NA NA 3.99 ...
## $ contraceptive: num [1:192] NA NA NA NA 1.28 ...
## $ nutrition : num [1:192] NA NA NA 1.739 0.725 ...
## $ injuries : num [1:192] NA NA NA 3.66 1.76 ...
## $ other : num [1:192] NA NA NA 0.0755 33.9561 ...
However, in this subset, there contains rows where there are only few amounts of data, and a few that are missing only a certain amount of data. For this, we will create a quick function to look through the specific columns listec below to check for the ones that we want to keep. We will keep the rows that contain less missing values than actual values that are in the row.
columns_to_check <- c("hiv_aids", "tuberculosis", "malaria", "neg_tropical", "reproductive", "maternal", "perinatal", "contraceptive", "nutrition", "injuries", "other")
clean_disease <- disease_filtered[rowSums(is.na(disease_filtered[columns_to_check])) < length(columns_to_check), ]
Now we want to take a look at the region specific data. We want to know which regions are most prominent in this subset now. From what we can see of this dataset, the most prominent region is the Africa region.
This dataset was collected by the World Health Organization, it has more data available for the Africa region, since they have numerous health initiatives as Africa faces a significant burden of communicable diseases, leading to more focus and more data collection to understand the underlying issues faced by those in the region.
However, the amount of data collected by other regions may be too low, in comparison, which could lead to other questions such as there is a signifcant lack of data for these areas specifically? Could it be a logistics issue or perhaps the government not willing to share the specific datasets.
region_counts_df <- clean_disease %>%
group_by(region) %>%
summarise(Count = n(), .groups = 'drop') %>%
filter(!is.na(region))
print(region_counts_df)
## # A tibble: 6 × 2
## region Count
## <chr> <int>
## 1 AFR 36
## 2 AMR 2
## 3 EMR 2
## 4 EUR 6
## 5 SEAR 1
## 6 WPR 3
# Stacked Bar Chart of AFR Region
ggplot(clean_disease %>% filter(region == "AFR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (AFR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") + # Place legend at the bottom
scale_y_continuous(labels = scales::percent_format(scale = 1))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).
In the Africa region it seems that the highest spending is on malaria, this can be explained by the mosquitos of that region having a longer life-span in comparison to those of other regions. Even though it is a disease that can be treated with care, in a region where it is hard to do so, prevents Malaria from being easily treated.
# Stacked Bar Chart of AMR Region
ggplot(clean_disease %>% filter(region == "AMR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (AMR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_bar()`).
In the Americas, there is a huge lack of data in this area, which can be
explained by the WHO keeping the United States data seperate from the
data making it hard to analyze both at the same time. However the lack
of data in the South of the Americas is suprising which could stem from
logistics issues or other limiting factors.
# Stacked Bar Chart of EMR Region
ggplot(clean_disease %>% filter(region == "EMR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (EMR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_bar()`).
For the EMR region it faces the same issue as the Americas, where there
is a suprising lack of information. This could be stemmed to logistics
issue where there are conflicts in countries inside the region such as
Syria, Yemen, and Afghanistan. This could lead to medical issues in that
region as well as not much access to treatment, making it hard for the
people in those regions to report the data.
# Stacked Bar Chart of EUR Region
ggplot(clean_disease %>% filter(region == "EUR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (EUR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_bar()`).
For the EUR region there is more readibly avaiable information
elsewhere, perhaps from the Eurpoean Union, which may lead to less
reporting to the WHO. As well as having a good healthcare system,
leading to few reports about some of the issues that other regions may
face.
# Stacked Bar Chart of SEAR Region
ggplot(clean_disease %>% filter(region == "SEAR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (SEAR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
In the South East Asian Region, this could stem from health system
imitations: Many countries in SEAR have under-resourced health systems
with limited capacity for comprehensive data collection. This can lead
to inconsistent or incomplete reporting of health metrics, especially in
rural or underserved areas. As well as, a high disease burden: The
region experiences a high burden of infectious diseases (e.g.,
tuberculosis, dengue, malaria) and non-communicable diseases (e.g.,
diabetes, cardiovascular diseases). The focus on addressing immediate
health challenges can deprioritize routine data collection efforts,
affecting data quality.
# Stacked Bar Chart of WPR Region
ggplot(clean_disease %>% filter(region == "WPR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (WPR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_bar()`).
The WPR region could face issues from a digital divide, some people may
have limited access to digital health technologies, resulting in a
digital divide that can impact data collection quality and
timeliness.
Selecting the revelant columns from the data set for the analysis and removing missing values in these columns
data <- GHED_data
subset_data <- data %>%
select(region, income, year, gghed_che, oops_che, che_gdp, che_pc_usd) %>%
filter(!is.na(gghed_che) & !is.na(oops_che) & !is.na(che_gdp) & !is.na(che_pc_usd))
We group the data by region and income to calculate the mean values of government health expenditure (gghed_che), out-of-pocket expenditure (oops_che), and overall health expenditure (che_gdp and che_pc_usd). This helps us summarize the key variables by region and income level.
summary_stats <- subset_data %>%
group_by(region, income) %>%
summarise(
mean_gghed_che = mean(gghed_che, na.rm = TRUE),
mean_oops_che = mean(oops_che, na.rm = TRUE),
mean_che_gdp = mean(che_gdp, na.rm = TRUE),
mean_che_pc_usd = mean(che_pc_usd, na.rm = TRUE)
)
## `summarise()` has grouped output by 'region'. You can override using the
## `.groups` argument.
# Print the summary stats
print(summary_stats)
## # A tibble: 19 × 6
## # Groups: region [6]
## region income mean_gghed_che mean_oops_che mean_che_gdp mean_che_pc_usd
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AFR High 74.5 23.3 4.68 552.
## 2 AFR Low 22.7 41.1 5.56 30.1
## 3 AFR Lower-middle 36.1 41.3 4.73 83.5
## 4 AFR Upper-middle 46.1 29.7 5.44 332.
## 5 AMR High 53.2 30.8 7.70 2002.
## 6 AMR Lower-middle 43.7 39.2 6.68 143.
## 7 AMR Upper-middle 54.0 35.4 5.98 388.
## 8 EMR High 74.1 17.2 3.48 1037.
## 9 EMR Low 25.5 65.9 6.53 56.1
## 10 EMR Lower-middle 39.9 48.1 5.06 197.
## 11 EMR Upper-middle 55.6 37.5 5.36 223.
## 12 EUR High 72.7 20.0 8.01 3160.
## 13 EUR Lower-middle 40.6 54.3 6.08 90.3
## 14 EUR Upper-middle 50.9 45.0 6.52 332.
## 15 SEAR Lower-middle 35.4 49.2 3.77 56.9
## 16 SEAR Upper-middle 61.6 26.9 6.15 393.
## 17 WPR High 72.3 15.7 7.91 1986.
## 18 WPR Lower-middle 51.2 23.7 5.68 115.
## 19 WPR Upper-middle 56.9 17.8 8.20 459.
We calculate correlations between:
This step helps to see how government spending or individual costs relate to overall health expenditure.
There is a negative correlation of -0.738 between government expenditure and out-of-pocket expenditure, which suggests that in countries where the government spends more on healthcare, individuals are required to pay less out-of-pocket for medical services and treatments. Essentially, greater government funding reduces the financial burden on citizens.
There is a positive correlation of 0.229 between government expenditure and health expenditure as % of GDP, which suggests the more the government contributes to healthcare, the more the country as a whole tends to spend on health relative to the size of its economy.
There is a negative correlation of -0.351 between out of pocket expenditure and health expenditure as % of GDP suggests that countries or regions with higher health expenditure as a percentage of GDP are likely to have lower out-of-pocket spending by individuals.
correlation_analysis <- subset_data %>%
summarise(
correlation_gghed_oops = cor(gghed_che, oops_che, use = "complete.obs"),
correlation_gghed_che_gdp = cor(gghed_che, che_gdp, use = "complete.obs"),
correlation_oops_che_gdp = cor(oops_che, che_gdp, use = "complete.obs")
)
# Print the correlation analysis
print(correlation_analysis)
## # A tibble: 1 × 3
## correlation_gghed_oops correlation_gghed_che_gdp correlation_oops_che_gdp
## <dbl> <dbl> <dbl>
## 1 -0.739 0.229 -0.351
The scatter plot explores the relationship between government expenditure and out-of-pocket costs across income groups. A linear trend line is added to assess the general trend. Here it shows that government expenditure on health is higher in high income region than low income region, and vice versa.
ggplot(subset_data, aes(x = gghed_che, y = oops_che, color = income)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Relationship between Government Expenditure and Out-of-Pocket Expenditure",
x = "Government Expenditure on Health (% of Current Health Expenditure)",
y = "Out-of-Pocket Expenditure (% of Current Health Expenditure)",
color = "Income Level"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
The bar chart shows average health expenditure per capita by income
level and region, providing a clear comparison of spending levels. The
drastic difference in average health expenditure per capita between high
income level region and all other three regions shows a concerning
reality of how wealth plays into an effect on health resource
allocation.
ggplot(summary_stats, aes(x = income, y = mean_che_pc_usd, fill = region)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Average Health Expenditure per Capita by Income Level and Region",
x = "Income Level",
y = "Average Health Expenditure per Capita (USD)",
fill = "Region"
) +
theme_minimal()
The line chart tracks government health expenditure as a % of current health expenditure over time by income level, allowing us to see trends across different income groups from 2000-2020.
Unsurprinsgly, the high income regions are consistenly high on their expenditure on health at around 80% of CHE. Upper middle regions are experiencing a decrease in expenditure since 2020 and has since fallen to the same expenditure level as the lower middle regions in 2022. The low income regions are struggling to stay above their expenditure level at around 20% of CHE.
These dynamics highlight the need for targeted policies to enhance healthcare financing and infrastructure, particularly in lower-income regions, to ensure equitable access to health services and improve overall health outcomes.
ggplot(subset_data, aes(x = year, y = gghed_che, color = income)) +
geom_line(stat = "summary", fun = "mean") +
labs(
title = "Trend of Government Health Expenditure (% of CHE) by Income Group (2000-2020)",
x = "Year",
y = "Government Expenditure on Health (% of Current Health Expenditure)",
color = "Income Group"
) +
theme_minimal()
The box plot identify detect outliers in out-of-pocket and government health expenditures across income levels and regions. This can help identify patterns and variances that warrant further investigation.
Out of pocket expenditure distributed by income level and region box plot shows that there are a few outliers in EUR and WPR region for the high income level and EMR region for upper middle income level.
ggplot(subset_data, aes(x = income, y = oops_che, fill = region)) +
geom_boxplot() +
labs(
title = "Out-of-Pocket Expenditure Distribution by Income Level and Region",
x = "Income Level",
y = "Out-of-Pocket Expenditure (% of Current Health Expenditure)"
) +
theme_minimal()
Government health expenditure distribution by income level and region
box plot shows there are outliers across all four income level. There
are outliers in EUR region for high income level; outliers in AFR region
for low income level; outliers in AMR region lower middle income level;
outliers in AFR region and WPR region for upper middle income level.
ggplot(subset_data, aes(x = income, y = gghed_che, fill = region)) +
geom_boxplot() +
labs(
title = "Government Health Expenditure Distribution by Income Level and Region",
x = "Income Level",
y = "Government Health Expenditure (% of Current Health Expenditure)"
) +
theme_minimal()
The outliers shows that that they are potentially skewing results or indicating areas requiring targeted policy interventions. For example the outliers seen in high income level from both the boxplots suggests that there is still a need for regulations to investigate high income regions despite already having the the highest government health expenditure as well as the lowest out of pocket expenditure.
To begin analyzing the data, we will select a subset of variables and rename them in order to make downstream analysis easier. Displayed below is a subset of the new dataset which we will use to explore our question. The goal is to look at the proportion of OOP healthcare costs vs. social insurance contributions across different countries.
The scope of this analysis will include all countries with valid data. We will begin with looking at all years, and eventually narrow down to 2019. We choose 2019 as a baseline as it represents the most recent snapshot of pre-COVID medical expenses.
# Rename variables to have intuitive names
OOP_subset <- GHED_data %>%
rename(
current_health_expenditure_pct_gdp = che_gdp,
gdp_per_capita_usd = gdp_pc_usd,
primary_health_care_expenditure_pct_che = phc_che,
total_expenditure = hf,
govt_and_compulsory_financing = hf1,
government_schemes = hf11,
compulsory_contributory_health_insurance = hf12,
social_health_insurance_schemes = hf121,
compulsory_private_insurance_schemes = hf122,
unspecified_compulsory_contributory_insurance = hf12nec,
compulsory_medical_savings_accounts = hf13,
unspecified_govt_compulsory_schemes = hf1nec,
voluntary_health_payment_schemes = hf2,
voluntary_health_insurance_schemes = hf21,
non_profit_institutions_serving_households = hf22,
enterprise_financing_schemes = hf23,
unspecified_voluntary_payment_schemes = hf2nec,
household_out_of_pocket_payments = hf3,
rest_of_world_financing_schemes = hf4
) %>%
select(
country,
region,
year,
income,
total_expenditure,
current_health_expenditure_pct_gdp,
gdp_per_capita_usd,
primary_health_care_expenditure_pct_che,
govt_and_compulsory_financing,
government_schemes,
compulsory_contributory_health_insurance,
social_health_insurance_schemes,
compulsory_private_insurance_schemes,
unspecified_compulsory_contributory_insurance,
compulsory_medical_savings_accounts,
unspecified_govt_compulsory_schemes,
voluntary_health_payment_schemes,
voluntary_health_insurance_schemes,
non_profit_institutions_serving_households,
enterprise_financing_schemes,
unspecified_voluntary_payment_schemes,
household_out_of_pocket_payments,
rest_of_world_financing_schemes
)
head(OOP_subset)
## # A tibble: 6 × 23
## country region year income total_expenditure current_health_expenditu…¹
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 Algeria AFR 2000 Lower-middle 143870. 3.49
## 2 Algeria AFR 2001 Lower-middle 162231. 3.84
## 3 Algeria AFR 2002 Lower-middle 168702. 3.73
## 4 Algeria AFR 2003 Lower-middle 189137. 3.60
## 5 Algeria AFR 2004 Lower-middle 217929. 3.54
## 6 Algeria AFR 2005 Lower-middle 244643. 3.24
## # ℹ abbreviated name: ¹current_health_expenditure_pct_gdp
## # ℹ 17 more variables: gdp_per_capita_usd <dbl>,
## # primary_health_care_expenditure_pct_che <dbl>,
## # govt_and_compulsory_financing <dbl>, government_schemes <dbl>,
## # compulsory_contributory_health_insurance <dbl>,
## # social_health_insurance_schemes <dbl>,
## # compulsory_private_insurance_schemes <dbl>, …
As it currently stands, the variables we have are not very informative. We will first transform them into percentages. We consider the total sending to be the sum of out of pocket payments, contribution from social insurance, compulsory prepayment, and voluntary prepayment. We will also do some minor pre-cleaning of the data.
head(OOP_subset)
## # A tibble: 6 × 23
## country region year income total_expenditure current_health_expenditu…¹
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 Algeria AFR 2000 Lower-middle 143870. 3.49
## 2 Algeria AFR 2001 Lower-middle 162231. 3.84
## 3 Algeria AFR 2002 Lower-middle 168702. 3.73
## 4 Algeria AFR 2003 Lower-middle 189137. 3.60
## 5 Algeria AFR 2004 Lower-middle 217929. 3.54
## 6 Algeria AFR 2005 Lower-middle 244643. 3.24
## # ℹ abbreviated name: ¹current_health_expenditure_pct_gdp
## # ℹ 17 more variables: gdp_per_capita_usd <dbl>,
## # primary_health_care_expenditure_pct_che <dbl>,
## # govt_and_compulsory_financing <dbl>, government_schemes <dbl>,
## # compulsory_contributory_health_insurance <dbl>,
## # social_health_insurance_schemes <dbl>,
## # compulsory_private_insurance_schemes <dbl>, …
# Calculate percentages of financing schemes relative to total health expenditure
OOP_subset <- OOP_subset %>%
mutate(
pct_govt_intervention = (govt_and_compulsory_financing / total_expenditure) * 100,
pct_voluntary_protection = (voluntary_health_payment_schemes / total_expenditure) * 100,
pct_household_out_of_pocket = (household_out_of_pocket_payments / total_expenditure) * 100
)
str(OOP_subset)
## tibble [4,244 × 26] (S3: tbl_df/tbl/data.frame)
## $ country : chr [1:4244] "Algeria" "Algeria" "Algeria" "Algeria" ...
## $ region : chr [1:4244] "AFR" "AFR" "AFR" "AFR" ...
## $ year : num [1:4244] 2000 2001 2002 2003 2004 ...
## $ income : chr [1:4244] "Lower-middle" "Lower-middle" "Lower-middle" "Lower-middle" ...
## $ total_expenditure : num [1:4244] 143870 162231 168702 189137 217929 ...
## $ current_health_expenditure_pct_gdp : num [1:4244] 3.49 3.84 3.73 3.6 3.54 ...
## $ gdp_per_capita_usd : num [1:4244] 1780 1755 1795 2117 2625 ...
## $ primary_health_care_expenditure_pct_che : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ govt_and_compulsory_financing : num [1:4244] 103539 123669 127002 145082 155532 ...
## $ government_schemes : num [1:4244] 66056 81878 83246 95931 102717 ...
## $ compulsory_contributory_health_insurance : num [1:4244] 37483 41791 43756 49152 52815 ...
## $ social_health_insurance_schemes : num [1:4244] 37483 41791 43756 49152 52815 ...
## $ compulsory_private_insurance_schemes : num [1:4244] 0 0 0 0 0 0 0 0 0 0 ...
## $ unspecified_compulsory_contributory_insurance: num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ compulsory_medical_savings_accounts : num [1:4244] 0 0 0 0 0 0 0 0 0 0 ...
## $ unspecified_govt_compulsory_schemes : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ voluntary_health_payment_schemes : num [1:4244] 3221 3409 3700 4055 5530 ...
## $ voluntary_health_insurance_schemes : num [1:4244] 1177 1375 1610 1855 3150 ...
## $ non_profit_institutions_serving_households : num [1:4244] 80 85 90.5 100 110 ...
## $ enterprise_financing_schemes : num [1:4244] 1964 1949 2000 2100 2270 ...
## $ unspecified_voluntary_payment_schemes : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ household_out_of_pocket_payments : num [1:4244] 37111 35153 38000 40000 56867 ...
## $ rest_of_world_financing_schemes : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ pct_govt_intervention : num [1:4244] 72 76.2 75.3 76.7 71.4 ...
## $ pct_voluntary_protection : num [1:4244] 2.24 2.1 2.19 2.14 2.54 ...
## $ pct_household_out_of_pocket : num [1:4244] 25.8 21.7 22.5 21.1 26.1 ...
We want to ensure that there are no missing values in our primary variables. It appears from below that our data in fairly clean.
# Remove rows with missing values in key variables
OOP_subset <- OOP_subset %>%
filter(
!is.na(pct_govt_intervention) &
!is.na(pct_voluntary_protection) &
!is.na(pct_household_out_of_pocket)
)
We will begin by exploring the basic distribution of our data.
ggplot(OOP_subset, aes(x = pct_govt_intervention)) +
geom_histogram(binwidth = 5, fill = 'blue', color = 'black') +
labs(
title = 'Distribution of Government Based Schemes',
x = 'Percentage (%)',
y = 'Frequency'
)
ggplot(OOP_subset, aes(x = pct_household_out_of_pocket)) +
geom_histogram(binwidth = 5, fill = 'red', color = 'black') +
labs(
title = 'Distribution of Household Out-of-Pocket Payments',
x = 'Percentage (%)',
y = 'Frequency'
)
ggplot(OOP_subset, aes(x = pct_voluntary_protection)) +
geom_histogram(binwidth = 5, fill = 'green', color = 'black') +
labs(
title = 'Distribution of Voluntary Payments',
x = 'Percentage (%)',
y = 'Frequency'
)
Now is a good time to explain what exactly each one of these variables really mean.
Firstly, the distribution of government-based schemes shows how much of total healthcare spending is funded through the government. A high percentage here indicates that healthcare programs are more tax-funded as opposed to individually funded.
Secondly, the distribution of out of pocket payments represents the total health care expenditure that is paid directly at the point of service. This means that premiums for insurance or the tax dollars that are paid are not included in this category.
Finally, the distribution of voluntary payments shows the relative amount of health expenditures coming from sources that people opt into. The easiest example is supplemental health insurance where people can choose what plan they want.
Now we will look into some descriptive statistics by year.
# Calculate basic statistical metrics for the new variables grouped by year
yearly_stats <- OOP_subset %>%
group_by(year) %>%
summarise(
mean_govt_intervention = mean(pct_govt_intervention, na.rm = TRUE),
median_govt_intervention = median(pct_govt_intervention, na.rm = TRUE),
sd_govt_intervention = sd(pct_govt_intervention, na.rm = TRUE),
mean_out_of_pocket = mean(pct_household_out_of_pocket, na.rm = TRUE),
median_out_of_pocket = median(pct_household_out_of_pocket, na.rm = TRUE),
sd_out_of_pocket = sd(pct_household_out_of_pocket, na.rm = TRUE),
mean_voluntary_protection = mean(pct_voluntary_protection, na.rm = TRUE),
median_voluntary_protection = median(pct_voluntary_protection, na.rm = TRUE),
sd_voluntary_protection = sd(pct_voluntary_protection, na.rm = TRUE)
)
head(yearly_stats)
## # A tibble: 6 × 10
## year mean_govt_intervention median_govt_intervention sd_govt_intervention
## <dbl> <dbl> <dbl> <dbl>
## 1 2000 52.8 51.1 22.0
## 2 2001 52.6 51.4 21.6
## 3 2002 52.4 52.0 21.9
## 4 2003 52.7 52.3 21.7
## 5 2004 52.2 52.7 21.3
## 6 2005 52.5 52.9 20.7
## # ℹ 6 more variables: mean_out_of_pocket <dbl>, median_out_of_pocket <dbl>,
## # sd_out_of_pocket <dbl>, mean_voluntary_protection <dbl>,
## # median_voluntary_protection <dbl>, sd_voluntary_protection <dbl>
Looking at the sample size for each datapoint, we can see that there is a significant drop at the final time point. This final time point is 2022 with only 20 samples. Thus, we shall remove 2022 from downstream analysis.
# Calculate the number of observations per year, removing rows with NAs
obs_count_per_year <- OOP_subset %>%
group_by(year) %>%
summarise(observation_count = n())
# Plot the number of observations over time
ggplot(obs_count_per_year, aes(x = year, y = observation_count)) +
geom_line(color = "blue", size = 1) +
geom_point(color = "red", size = 2) +
labs(
title = "Number of Observations Over Time",
x = "Year",
y = "Number of Observations"
)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Utilizing these yearly aggregated statistics, let us observe how the
following have changed over time for each one of our variables:
How the median has changed over time. This will indicate relative trends for each of our variables.
How the coefficient of variation has changed over time. This will indicate if countries are either conforming to similar policies or if, instead, they are becoming more disparate.
yearly_stats <- yearly_stats %>%
filter(year != 2022)
ggplot(yearly_stats, aes(x = year)) +
geom_line(aes(y = median_govt_intervention, color = "Govt Intervention"), size = 1) +
geom_line(aes(y = median_out_of_pocket, color = "Out-of-Pocket"), size = 1) +
geom_line(aes(y = median_voluntary_protection, color = "Voluntary Protection"), size = 1) +
labs(
title = "Median of Health Financing Variables Over Time",
x = "Year",
y = "Median (%)",
color = "Variable"
)
yearly_stats$cv_govt_intervention <- yearly_stats$sd_govt_intervention / yearly_stats$mean_govt_intervention
yearly_stats$cv_out_of_pocket <- yearly_stats$sd_out_of_pocket / yearly_stats$mean_out_of_pocket
yearly_stats$cv_voluntary_protection <- yearly_stats$sd_voluntary_protection / yearly_stats$mean_voluntary_protection
ggplot(yearly_stats, aes(x = year)) +
geom_line(aes(y = cv_govt_intervention, color = "Govt Intervention"), size = 1) +
geom_line(aes(y = cv_out_of_pocket, color = "Out-of-Pocket"), size = 1) +
geom_line(aes(y = cv_voluntary_protection, color = "Voluntary Protection"), size = 1) +
labs(
title = "Coefficient of Variation of Health Financing Variables Over Time",
x = "Year",
y = "Coefficient of Variation",
color = "Variable"
)
For the final portion of this analysis, we will look at the relationship between percent spending on government intervention and out of pocket costs. We will also investigate how voluntary protection can protect consumers.
# Scatter plot of Social Health Insurance vs. Out-of-Pocket Payments
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_household_out_of_pocket, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Out-of-Pocket Payments vs. Government Intervention',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Out-of-Pocket Payments (% of Total Health Expenditure)'
)
## `geom_smooth()` using formula = 'y ~ x'
# Scatter plot of Social Health Insurance vs. Out-of-Pocket Payments
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_voluntary_protection, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Voluntary Protection Contribution vs. Government Intervention Contribution',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Voluntary Protection Contribution (% of Total Health Expenditure)'
)
## `geom_smooth()` using formula = 'y ~ x'
The analysis above has opened up potentially interesting question for further investigation. Trivially, we have shown that as the magnitude of government spending on healthcare decreases, the amount of out of pocket spending on the behalf of the individual increases. However, we notice that there are some poorer countries that are able to have less government support, but still have out of pocket costs that are comparable to those of higher income countries. Investigating why this is the case may prove fruitful. Furthermore, it seems that some low income countries are able to supplement low government intervention with voluntary protection plans. However, there are others that are unable to. Identifying what characteristics distinguish these two types of countries may also prove to be an interesting further line of inquiry.
# Investigation of how government spending in the previous period impacts
# label minimum year
labeler <- OOP_subset %>%
group_by(country) %>%
summarise(min(year))
colnames(labeler) <- c("country", "year")
labeler$first_year <- TRUE
OOP_subset <- OOP_subset %>%
left_join(labeler, by = c("country", "year"))
gdp_joiner <- GHED_data %>%
select(country, year, gdp)
OOP_subset <- OOP_subset %>%
left_join(gdp_joiner, by = c("country", "year"))
OOP_subset_sorted <- OOP_subset %>%
arrange(country, year)
# lead since we want prev
OOP_subset_sorted$prev_gov_int <- lead(OOP_subset_sorted$pct_govt_intervention)
# compute change in gov int
OOP_subset_sorted$gov_int_change <- OOP_subset_sorted$pct_govt_intervention - OOP_subset_sorted$prev_gov_int
OOP_subset_sorted$prev_oop <- lead(OOP_subset_sorted$household_out_of_pocket_payments)
OOP_subset_sorted$change_oop <- OOP_subset_sorted$household_out_of_pocket_payments - OOP_subset_sorted$prev_oop
OOP_subset_sorted$gdp_oop_ratio <- OOP_subset_sorted$household_out_of_pocket_payments / OOP_subset_sorted$gdp
OOP_subset_sorted$gdp_oop_ratio_difference <- OOP_subset_sorted$change_oop / OOP_subset_sorted$gdp
OOP_subset_sorted <- OOP_subset_sorted %>%
mutate(first_year = if_else(is.na(first_year), FALSE, first_year)) %>%
filter(!first_year)
OOP_subset_sorted %>%
ggplot(aes(x = gov_int_change, y = gdp_oop_ratio_difference)) +
geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
# compute the following
# P(OOP Decreases | Gov Int Increases)
OOP_subset_sorted$pos_gov_change <- OOP_subset_sorted$gov_int_change > 0
OOP_subset_sorted$pos_oop_change <- OOP_subset_sorted$gdp_oop_ratio_difference > 0
proportions <- OOP_subset_sorted %>%
mutate(pos_gov_change = gov_int_change > 0,
neg_oop_change = gdp_oop_ratio_difference < 0) %>%
group_by(pos_gov_change) %>%
summarize(
total = n(),
neg_oop_count = sum(neg_oop_change, na.rm = TRUE),
neg_oop_proportion = mean(neg_oop_change, na.rm = TRUE)
) %>%
drop_na()
contingency_table <- OOP_subset_sorted %>%
mutate(pos_gov_change = gov_int_change > 0,
neg_oop_change = gdp_oop_ratio_difference < 0) %>%
count(pos_gov_change, neg_oop_change) %>%
drop_na() %>%
tidyr::pivot_wider(names_from = neg_oop_change, values_from = n, values_fill = 0)
chisq_test <- chisq.test(as.matrix(contingency_table[,-1]))
OOP_subset_sorted %>%
ggplot(aes(x = pos_gov_change, y = gdp_oop_ratio_difference)) +
geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
proportions_yr <- OOP_subset_sorted %>%
mutate(pos_gov_change = gov_int_change > 0,
neg_oop_change = gdp_oop_ratio_difference < 0) %>%
group_by(pos_gov_change, year) %>%
summarize(
total = n(),
neg_oop_count = sum(neg_oop_change, na.rm = TRUE),
neg_oop_proportion = mean(neg_oop_change, na.rm = TRUE)
) %>%
drop_na()
## `summarise()` has grouped output by 'pos_gov_change'. You can override using
## the `.groups` argument.
proportions_yr %>%
ggplot(aes(x = year, y = neg_oop_proportion, color = pos_gov_change)) +
geom_point() + labs(x = "Year", y = "Negative OOP Change Proportion", color = "Positive Change Gov Spending")
# Perform the regression analysis
model <- lm(pct_household_out_of_pocket ~ pct_govt_intervention, data = OOP_subset)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_household_out_of_pocket, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Out-of-Pocket Payments vs. Government Intervention',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Out-of-Pocket Payments (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset$pct_govt_intervention) - 10,
y = max(OOP_subset$pct_household_out_of_pocket) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
# Perform the regression analysis
model <- lm(pct_voluntary_protection ~ pct_govt_intervention, data = OOP_subset)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_voluntary_protection, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Voluntary Protection Contribution vs. Government Intervention Contribution',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Voluntary Protection Contribution (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset$pct_govt_intervention) - 10,
y = max(OOP_subset$pct_voluntary_protection) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
foreign_sources <- GHED_data %>%
select(country, year, fs7)
OOP_subset <- OOP_subset %>%
left_join(foreign_sources, by = c("year", "country"))
OOP_subset$ratio_outsourcing <- (OOP_subset$fs7 / OOP_subset$total_expenditure) * 100
OOP_subset_lowinc <- OOP_subset %>%
filter(income %in% c("Low", "Lower-middle")) %>%
filter(!is.na(ratio_outsourcing))
# Perform the regression analysis
model <- lm(pct_govt_intervention ~ ratio_outsourcing, data = OOP_subset_lowinc)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset_lowinc, aes(x = ratio_outsourcing, y = pct_govt_intervention, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Percent Government Intervention vs. Magnitude of Outsourcing',
x = 'Outsourcing Magnitude (% of Total Health Expenditure)',
y = 'Government Intervention (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset_lowinc$ratio_outsourcing) - 10,
y = max(OOP_subset_lowinc$pct_household_out_of_pocket) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
OOP_subset_lowinc <- OOP_subset %>%
filter(income %in% c("Low", "Lower-middle")) %>%
filter(!is.na(ratio_outsourcing))
# Perform the regression analysis
model <- lm(pct_household_out_of_pocket ~ ratio_outsourcing, data = OOP_subset_lowinc)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset_lowinc, aes(x = ratio_outsourcing, y = pct_household_out_of_pocket, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Percentage of Household OOP Costs vs. Magnitude of Outsourcing',
x = 'Magnitude of Outsourcing (% of Total Health Expenditure)',
y = 'OOP Costs (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset_lowinc$ratio_outsourcing) - 10,
y = max(OOP_subset_lowinc$pct_household_out_of_pocket) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
Vaccine and specific healthcare spending data is hard to come across in this dataset. There are many missing values which makes it difficult to make meaningful comparisons. However, it is interesting to note that we are able to observe how different regions have different predominant diseases.
We then looked into how different country types allocate funds towards healthcare based on region and income level. There were stark differences and an abundance of data present. We were also able to observe outliers which means that there are likely other factors that influence how money is allocated towards healthcare.
We also looked specifically towards how insurance is utilized to offset out of pocket costs in countries with low government healthcare spending. Some low income countries are able to brace healthcare costs utilizing private insurance. However, others are unable to do so. Exploring why this is the case may also prove to be a fruitful line of inquiry.
There are also a few different questions that may be worthwhile to explore. A key question is why some low-income countries exhibit lower out-of-pocket costs despite minimal government spending. Investigating the roles of external funding, voluntary contributions, and cultural or policy factors could help identify replicable strategies for reducing financial burdens in other low-resource settings. This would require merging additional data with our current data. For example, categorizing policy changes would be important in trying to identify effective operating procedures when providing aid to countries. This would be an excellent candidate for future work.
Citations:
World Health Organization (WHO). (2023). Global Health Expenditure Database. World Health Organization. Retrieved from https://apps.who.int/nha/database.
Dieleman, J. L., & Hanlon, M. (2013). MEASURING THE DISPLACEMENT AND REPLACEMENT OF GOVERNMENT HEALTH EXPENDITURE. Health Economics, 23(2), 129–140. https://doi.org/10.1002/hec.3016
Hartman, M., Martin, A. B., Washington, B., Catlin, A., & The National Health Expenditure Accounts Team. (2022). National health care spending in 2020: Growth driven by federal spending in response to the COVID-19 pandemic. Health Affairs, 41(1), 13–25. https://doi.org/10.1377/hlthaff.2021.01763
Morrissey, O. (2015). Aid and government fiscal behavior: Assessing recent evidence. World Development, 69, 98–105. https://doi.org/10.1016/j.worlddev.2013.12.008
Piatti-Fünfkirchen, M., Lindelow, M., & Yoo, K. (2018). What are governments spending on health in east and southern africa? Health Systems & Reform, 4(4), 284–299. https://doi.org/10.1080/23288604.2018.1510287
Adam Wagstaff, Patrick Eozenou, Marc Smitz, Out-of-Pocket Expenditures on Health: A Global Stocktake, The World Bank Research Observer, Volume 35, Issue 2, August 2020, Pages 123–157, https://doi.org/10.1093/wbro/lkz009
Past, present, and future of global health financing: a review of development assistance, government, out-of-pocket, and other private spending on health for 195 countries, 1995–2050
Chang, Angela Y. et al. The Lancet, Volume 393, Issue 10187, 2233 - 2260, https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30841-4/fulltext